Random Effects, Mixed Effects, & Hierarchical/Multilevel Models

11/18/2024

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976 (the Fair Use doctrine), allowance is made for fair use for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Motivation for Hierarchical/Multilevel/Mixed Effects Models
    • Theory behind random effects in regression models
    • Examples in R

Hierarchical/Multilevel Data

Recall our MLR Model assumptions: \( E(Y) = \mathbf{X}\boldsymbol\beta \), and \( \epsilon_i \) are all independent with mean 0 and variance \( \sigma^2 \).

Frequently, we encounter data sets with additional structure, e.g.,

  • Data collected from subplots within multiple non-adjacent sites, scattered across different countries.
  • Multiple measurements from single individuals under different conditions, etc.

Think of such data as having one or more levels of group structure, where the regression parameters may vary from group to group.

  • We'd like to estimate those within- and across-group regression parameters.

Mathematically, we might posit a regression model for individual \( i \) (who belongs to group \( j \)) as

  • \( Y_{i} = \beta_{0j} + \sum_{k=1}^p(\beta_{kj})X_{ki}+\epsilon_{i} \), but it's better to quantify the average \( \beta \) values, e.g.,
  • \( Y_{i} = \beta_0+\beta_{0j} + \sum_{k=1}^p(\beta_k+\beta_{kj})X_{ki}+\epsilon_i \), where \( \epsilon_i \) are all independent with mean 0 and variance \( \sigma^2 \).

Group \( j \) regression parameters are the sum of the population-level estimate and group-level estimated deviation: \( \beta_{k}+\beta_{kj} \).

  • Example: Group \( 3 \) has an intercept value of \( \beta_0+\beta_{03} \).

Notice we haven't made any assumptions yet about whether any relationships exist (or not) between these group-level deviations from the population level parameters. We'll explore some options shortly.

Hierarchical/Multilevel Data Example: Pig growth data

These data (from the cv package) track the weight of 48 individual pigs over 9 weeks (weight in kilograms).

library(cv) # Cross Validation package
head(Pigs)   
  id week weight
1  1    1   24.0
2  1    2   32.0
3  1    3   39.0
4  1    4   42.5
5  1    5   48.0
6  1    6   54.5
summary(Pigs)  
       id             week       weight     
 Min.   : 1.00   Min.   :1   Min.   :20.00  
 1st Qu.:12.75   1st Qu.:3   1st Qu.:37.00  
 Median :24.50   Median :5   Median :50.50  
 Mean   :24.50   Mean   :5   Mean   :50.41  
 3rd Qu.:36.25   3rd Qu.:7   3rd Qu.:63.50  
 Max.   :48.00   Max.   :9   Max.   :88.00  

Source: Data were provided by Dr. Philip McCloud of Monash University, Melbourne, to the authors of the book Analysis of longitudinal data, by Diggle, Liang, & Zeger (1994). https://doi.org/10.1093/oso/9780198524847.001.0001.

Hierarchical/Multilevel Data Example: Pig growth data

plot(weight ~ week, data = Pigs, type = "n")        # empty plotting frame
for (i in 1:max(Pigs$id)) {                         # one growth curve per pig
  lines(1:9, Pigs[Pigs$id == i, "weight"], type = "o", pch = 19, col = rainbow(48)[i])
}
abline(lm(weight ~ week, data = Pigs), lwd = 4)     # pooled best fit line
legend('topleft', "Overall best fit line", lty = 1, lwd = 4)

Q: Do the individual pigs' growth curves look like they have the same slopes and intercepts?

Regression Modelling Options?


Goal: Quantify the variability in regression parameters at different levels of the data hierarchy.

Option 1: One (unwise) approach to modelling data like these is to just ignore the individual differences and pretend all individuals are independently and identically distributed.

  • We would have a sample size of \( n=48\times 9=432 \)
  • Relatively strong statistical power (e.g., small standard errors on \( \beta \) estimates).
  • Our parameter estimates would be simpler to interpret, but based on flawed assumptions.

Option 2: Another (unwise) approach is to estimate separate regression parameters for each pig (e.g., using interaction terms)

  • This prevents data from one pig influencing the estimates of another.
  • This gives each regression a sample size of only 9
  • We'd be estimating 48 \( \times \) 2 regression parameters plus 48 residual standard deviations!
  • This will cost us statistical power! (e.g., we'll have wider standard errors on \( \beta \) estimates)
  • We won't be able to speak directly about the population-level (average) slope (limitation for hypothesis testing, etc.)


Hierarchical models, like linear mixed effects models, try to strike a balance between these two options.
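
To make the two options above concrete, here is a minimal sketch of both fits on the Pigs data introduced on the previous slides (the object names fit.pooled and fit.separate are just for illustration):

fit.pooled   = lm(weight ~ week, data = Pigs)               # Option 1: ignore pig identity
fit.separate = lm(weight ~ week * factor(id), data = Pigs)  # Option 2: separate line per pig
length(coef(fit.pooled))    # 2 regression parameters
length(coef(fit.separate))  # 96 regression parameters (48 intercepts + 48 slopes)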

Random & Mixed Effects in Linear Models


Recall our tentative model from a few slides back, for individual \( i \) in group \( j \): \[ Y_{i} = \beta_0+\beta_{0j} + \sum_{k=1}^p(\beta_k+\beta_{kj})X_{ki}+\epsilon_i \] LME assumption: the group-level parameter deviations (\( \beta_{kj} \)) are randomly sampled from a Normal(\( 0 \), \( \sigma_{\beta_k} \)) distribution, so group \( j \)'s parameter \( \beta_k+\beta_{kj} \) is distributed Normal(\( \beta_k \), \( \sigma_{\beta_k} \)).

  • Fixed effects refer to the population-level means \( \beta_k \), and
  • random effects are the group-level deviations about that population level parameter (denoted \( \beta_{kj} \))
    • Regression parameters for a given group are the sum of the fixed effect and the random effect.
  • Mixed effects models are models with fixed and random effects.

Some important differences between this approach and option #2 above:

  • Group-level parameter estimates don't count against our degrees of freedom. We maintain most of our statistical power!
    • Ex: instead of estimating 48 intercepts from the pig data, we're estimating the 2 parameters \( \beta_0 \) and \( \sigma_{\beta_0} \).
  • We still obtain group-level regression parameter estimates, BUT, they're constrained by our distributional assumption.
    • Each pig's regression estimates are biased toward the population mean by the other pigs' data, due to the Gaussian constraint
    • Estimates will be similar to the option #2 (pig-by-pig) estimates, but they will be more narrowly dispersed around the mean slope (we call this shrinkage; see the sketch below, https://m-clark.github.io/posts/2019-05-14-shrinkage-in-mixed-models/, and Ch. 5 in Zuur et al.)
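
As a quick check of that shrinkage claim, here is a minimal sketch comparing the spread of pig-by-pig OLS slopes to the group-level slopes from the mixed model fit used later in these slides (the object name Pfit matches that later example):

library(lme4)
Pfit = lmer(weight ~ week + (week | id), data = Pigs)    # random intercept & slope per pig
ols.slopes  = sapply(unique(Pigs$id),
                     function(i) coef(lm(weight ~ week, data = Pigs[Pigs$id == i, ]))["week"])
lmer.slopes = coef(Pfit)$id[, "week"]                    # fixed effect + random deviation
c(SD.OLS = sd(ols.slopes), SD.LMER = sd(lmer.slopes))    # LMER slopes are less dispersed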

Overview for Random & Mixed Effects in Linear Models

Random effects (intercept only) model with no predictors, where we'll deviate from the notation used in Zuur et al (2009) and index individuals as the \( i^{th} \) individual in group \( j \):

\[ Y_{ij} = \beta_0 + \beta_{0j} + \epsilon_{ij} \]

Assumptions:

  • The \( \beta_{0j} \) values are normally distributed with mean 0 and variance \( \sigma_0^2 \)
  • \( \epsilon_{ij} \) are all independent, mean zero, standard deviation \( \sigma \) (GLS-like generalizations exist).
  • Parameters to estimate: \( \beta_0 \), \( \sigma_0 \), \( \sigma \).
    • Parameter estimation by restricted maximum likelihood (REML) or ML.

Random intercept model: mixed effect for intercept, fixed effect for one predictor \( X \) (just one, for simplicity):
\[ Y_{ij} = \beta_0+\beta_{0j} + \beta_1 \, X_{ij} + \epsilon_{ij} \]

  • Parameters to estimate: \( \beta_0 \), \( \sigma_0 \), \( \beta_1 \), \( \sigma \).

Random intercept and slope model with mixed effects for both slope and intercept:
\[ Y_{ij} = \beta_0+\beta_{0j} + (\beta_1+\beta_{1j})X_{ij}+\epsilon_{ij} \]

  • The \( \beta_{kj} \) values (\( k = 0, 1 \)) are normally distributed with mean 0 and variance \( \sigma_k^2 \)
  • Parameters to estimate: \( \beta_0 \), \( \sigma_0 \), \( \beta_1 \), \( \sigma_1 \), \( \sigma \).
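
In lme4 syntax (previewing the examples that follow), these three models would be specified roughly as below; Y, X, group, and d are placeholder names:

library(lme4)
m0 = lmer(Y ~ 1 + (1 | group), data = d)  # random intercept only, no predictors
m1 = lmer(Y ~ X + (1 | group), data = d)  # random intercept, fixed slope for X
m2 = lmer(Y ~ X + (X | group), data = d)  # random intercept and random slope for X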

Example with simulated data

Let's simulate some data and explore the use of the tools in lme4.

# Simulate data with a random effect for the slope on predictor X
set.seed(757)

Ngroups = 7; N=20 # per group
b0=50 # fixed, not random, intercept; 
mean.b1=2.5 # random slope
sd.b1=1
b1=sort(rnorm(Ngroups, mean.b1, sd.b1))
b1 # random slopes to use for each Group
[1] 1.595831 1.729731 1.900448 2.034316 2.285474 2.700080 3.802416
X=runif(N*Ngroups,1,100)
Y=rnorm(length(X), b0 + rep(b1,each=N)*X, sd=16)
dat=data.frame(Y=Y,X=X,
               Group=factor(rep(1:Ngroups,each=N)))
head(dat,2)
          Y        X Group
1  70.32508 24.96940     1
2 146.51905 65.44526     1

Example with simulated data

plot(X,Y, col=rep(rainbow(Ngroups), each=N), pch=20) # Color by group
abline(lm(Y~X), lty=1, lwd=2, col="black") # Best fit line ignoring groups
# best fit line within each group
for(i in 1:length(b1)) { 
  abline(lm(Y~X, data=dat[dat$Group==i,]), lty=2, col=rainbow(Ngroups)[i]) 
}

Example with simulated data

By default, parameters are estimated via REML, which gives less biased estimates of the variance components than ML does.

  • Use REML=FALSE (i.e., fit by ML) when comparing models via AIC, likelihood ratio tests, etc.
# best fit lines for each group using a full mixed effects model
library(lme4) # Use ML when doing model comparison or variable selection! Otherwise, REML.
rfit = lmer(Y ~ X + (X | Group), REML=T, data=dat);rfit 
Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ X + (X | Group)
   Data: dat
REML criterion at convergence: 1200.681
Random effects:
 Groups   Name        Std.Dev. Corr
 Group    (Intercept)  1.0244      
          X            0.7363  1.00
 Residual             15.8266      
Number of obs: 140, groups:  Group, 7
Fixed Effects:
(Intercept)            X  
     49.432        2.282  
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 

Note that the (X | Group) term gives a random effect for both the slope and the intercept!
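
If we wanted to compare this model to a simpler alternative (say, the random-slope-only fit on the next slide), the earlier bullet suggests refitting by ML first; a minimal sketch, with object names chosen just for illustration:

rfit.ML  = lmer(Y ~ X + (X | Group),     REML = FALSE, data = dat)  # random intercept + slope
rfit1.ML = lmer(Y ~ X + (X - 1 | Group), REML = FALSE, data = dat)  # random slope only
AIC(rfit.ML, rfit1.ML)       # compare the ML fits via AIC
anova(rfit.ML, rfit1.ML)     # approximate likelihood ratio test of the nested models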

Example with simulated data

library(lmerTest)  # lmerTest adds hypothesis test info to the summary output
# (X - 1 | Group) specifies a random slope for X only, with no random intercept
rfit1 = lmer(Y ~ X + (X-1 | Group), data=dat);  summary(rfit1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Y ~ X + (X - 1 | Group)
   Data: dat

REML criterion at convergence: 1200.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.81692 -0.60154  0.04225  0.67061  2.55511 

Random effects:
 Groups   Name Variance Std.Dev.
 Group    X      0.5631  0.7504 
 Residual      250.6692 15.8325 
Number of obs: 140, groups:  Group, 7

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  49.2571     2.7496 132.0413  17.914  < 2e-16 ***
X             2.2844     0.2875   6.2696   7.945 0.000169 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
  (Intr)
X -0.143

Example with simulated data: GLM context

We can fit GLMs as you might expect, using glmer() instead of glm(). With family = gaussian (the identity link), this is equivalent to the previous lmer() fit, and lme4 simply hands the call off to lmer().

grfit = glmer(Y ~ X + (X-1 | Group), data=dat, family=gaussian);  summary(grfit) 
Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ X + (X - 1 | Group)
   Data: dat

REML criterion at convergence: 1200.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.81692 -0.60154  0.04225  0.67061  2.55511 

Random effects:
 Groups   Name Variance Std.Dev.
 Group    X      0.5631  0.7504 
 Residual      250.6692 15.8325 
Number of obs: 140, groups:  Group, 7

Fixed effects:
            Estimate Std. Error t value
(Intercept)  49.2571     2.7496  17.914
X             2.2844     0.2875   7.945

Correlation of Fixed Effects:
  (Intr)
X -0.143
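
Where glmer() genuinely differs from lmer() is for non-Gaussian responses. For example, a count outcome might be modeled with a Poisson GLMM; this is only a hypothetical sketch (the data frame d and its columns count, X, and Group are made-up names, not from these slides):

# Hypothetical Poisson GLMM: a count response with a random intercept per group
pois.fit = glmer(count ~ X + (1 | Group), data = d, family = poisson)
summary(pois.fit)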

Example with simulated data

isSingular(rfit1) # See ?isSingular for details. False is good. :)
[1] FALSE
# Here are the best fit parameters for each group...
coefficients(rfit1)$Group
  (Intercept)        X
1    49.25711 1.630928
2    49.25711 1.736566
3    49.25711 1.953450
4    49.25711 2.046965
5    49.25711 2.202513
6    49.25711 2.600950
7    49.25711 3.819426
# How do these compare to the data simulation (true) values?
cbind(True.Slopes=b1, Est.Slopes=coefficients(rfit1)$Group[,2])
     True.Slopes Est.Slopes
[1,]    1.595831   1.630928
[2,]    1.729731   1.736566
[3,]    1.900448   1.953450
[4,]    2.034316   2.046965
[5,]    2.285474   2.202513
[6,]    2.700080   2.600950
[7,]    3.802416   3.819426
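
These group-level coefficients are just the fixed effect plus each group's random deviation; a quick sketch using lme4's fixef() and ranef() accessors confirms this:

fixef(rfit1)                               # population-level (fixed) estimates
ranef(rfit1)$Group                         # group-level deviations (random effects)
fixef(rfit1)["X"] + ranef(rfit1)$Group$X   # reproduces coefficients(rfit1)$Group[,2]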

Example with simulated data

plot(b1, coefficients(rfit1)$Group[,2], xlab="True Slopes for Groups", 
     ylab="Estimated Slopes for Groups"); 
  abline(a=0, b=1) # reference line: intercept 0, slope 1

Example with simulated data

# Best fit line within each group: True vs estimated 
plot(X,Y, col=rep(rainbow(Ngroups), each=N), pch=20)
abline(lm(Y~X), lty=3, lwd=5, col="black")
for(i in 1:length(b1)) {
  abline(b0, b1[i], lty=2, lwd=2, col=rainbow(Ngroups)[i]);
  abline(unlist(coefficients(rfit1)$Group[i,]), lty=1, lwd=2, col=rainbow(Ngroups)[i]);
}
legend('topleft',c("True","Estimated","SLR"), lty=c(2,1,3),lwd=c(2,2,5))

Pig growth data example

Pfit = lmer(weight ~ week + (week | id), data=Pigs);  summary(Pfit) # library(lmerTest) to get p-vals
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: weight ~ week + (week | id)
   Data: Pigs

REML criterion at convergence: 1740.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.6202 -0.5474  0.0150  0.5486  2.9939 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 id       (Intercept) 6.986    2.6432        
          week        0.380    0.6164   -0.06
 Residual             1.597    1.2637        
Number of obs: 432, groups:  id, 48

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept) 19.35561    0.40387 47.00004   47.93   <2e-16 ***
week         6.20990    0.09204 47.00077   67.47   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
week -0.132
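
If interval estimates are wanted as well, lme4 provides a confint() method for the fitted model; a minimal sketch (Wald intervals for the fixed effects shown here, since profile intervals can be slow):

confint(Pfit, parm = "beta_", method = "Wald")  # approximate 95% CIs for the fixed effects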

Pig growth data example

isSingular(Pfit) # FALSE is good
[1] FALSE
coefficients(Pfit)
$id
   (Intercept)     week
1     19.59583 5.813491
2     17.76410 6.721253
3     15.81816 6.531738
4     21.33010 5.436096
5     20.66434 5.283964
6     18.20918 5.664767
7     16.75841 6.250396
8     18.21689 6.040443
9     16.16318 5.473554
10    20.51879 6.212529
11    19.56571 6.274587
12    18.61198 6.203627
13    18.00780 6.005597
14    18.41249 5.722986
15    16.61927 7.288605
16    15.51606 6.127593
17    26.11286 6.276210
18    22.65131 4.832620
19    24.63000 6.313625
20    20.78559 6.013639
21    20.72208 5.660488
22    22.65456 6.107961
23    20.11090 7.050579
24    20.78047 6.170029
25    15.63791 5.769765
26    15.28836 5.896836
27    17.99367 5.520279
28    16.93479 6.214271
29    19.56767 7.283895
30    16.68012 5.278910
31    17.19519 6.515998
32    17.25231 7.369767
33    19.43871 6.178544
34    18.80121 6.997048
35    17.77885 6.729328
36    20.50274 5.938423
37    18.83326 6.324742
38    19.39256 6.975793
39    19.00195 6.523202
40    18.26821 6.307325
41    19.34824 5.120790
42    19.45602 6.108424
43    19.37264 5.903588
44    22.06386 6.262233
45    25.83643 6.310636
46    20.27636 6.583958
47    22.86370 7.363327
48    21.06463 7.121539

attr(,"class")
[1] "coef.mer"

Pig growth data example: Best fit lines for each pig

matplot(1:9, matrix(Pigs$weight, ncol=48, nrow=9, byrow=F), type="p", 
        col=rep(rainbow(48)), pch=20, ylab="Weight", xlab="Week")
for(i in 1:48) {  abline(unlist(coefficients(Pfit)$id[i,]), lty=1, lwd=1, col=rainbow(48)[i]); 
    abline(lm(weight~week,data=Pigs[Pigs$id==i,])$coefficients, lty=2, lwd=1, col=rainbow(48)[i]); }
legend('topleft',c("Individually Estimated","LMER Estimated"), lty=c(2,1),lwd=c(2,2))

Pig growth data example: Differences between LMER and individually fit lines

plot(1:9, rep(0,9), ylim=c(-1,1), ylab="Weight difference", xlab="Week", type="n")
for(i in 1:48) {  # differences between best fit lines: LMER - individually fit
  abline(unlist(coefficients(Pfit)$id[i,]) - lm(weight~week,data=Pigs[Pigs$id==i,])$coefficients, lty=1, lwd=1, col=rainbow(48)[i]); }

Summary for Linear Mixed Effects Models in R